SUPPORT / SAMPLES & SAS NOTES
 

Support

Usage Note 36112: Managing large SAS® data sets that exceed the maximum number of observations

DetailsAboutRate It

Information on Very Large SAS data sets has been added to SAS 9.3. Documentation.

Understanding the Observation Count in a SAS Data File

Extending the Observation Count in a SAS Data File

The information below applies to SAS 9.2 ONLY

Understanding the Maximum Number of Observations

A SAS data set stores the number of observations it has as an integer in the C programming language signed long-data type. This means that the maximum number of observations that can be counted for a SAS data set is limited by the long integer size for the operating environment.

  • In operating environments with a 32-bit long integer, the maximum number is 2**31-1 or approximately two billion observations.
  • In operating environments with a 64-bit long integer, the number is 2**63-1 or approximately 9.2 quintillion observations.

In some 64-bit environments, long integer data types use the 32-bit data model (to maintain compatibility with 32-bit applications). This means that a 64-bit environment could reach the maximum number of approximately two billion observations. OpenVMS for Integrity Servers, 64-bit Windows Itanium, and 64-bit Windows x64 editions use the 32-bit long data model.

What Happens When the Maximum is Reached

For SAS 8 and later, when a SAS data set exceeds the maximum number of observations for the operating environment, continued SAS processing depends on whether the SAS data set has an index or an integrity constraint that uses an index.

  • If the data set has an index or an integrity constraint that uses an index, when an operation attempts to exceed the maximum, the observation is rejected and an error message is issued. For example, if the SAS data set has an integrity constraint that uses an index, when the maximum is reached, this message is issued:
    ERROR: File MYFILES.BIGFILE contains 2G -1 observations and cannot hold more because it contains an index or an Integrity Constraint that uses an index.

    For SAS 9, a SAS data set is never damaged when an operation attempts to exceed the maximum number of operations. However, you must take explicit action to continue processing the SAS data set.

  • If the data set does not have an index or an integrity constraint that uses an index, sequential processing continues and additional observations are accepted. However, the data set cannot store the observation count and does not maintain the observation numbers. Any operation that requires an observation number is not available. There are no messages to indicate that the SAS data set has reached or exceeded the maximum number of observations.

Limited Functionality When the Maximum is Exceeded

Observation Count

When the observation count is no longer maintained, the observation number is represented by a missing value. The following functionality is affected:

  • SAS procedures that return an observation count (such as the PRINT procedure and the CONTENTS procedure) return a missing value for the number of observations. The missing value is represented by a period (.). For example, PROC CONTENTS could display a period in the Observations field, and you might see a message such as:

    Note: The data set MYFILES.BIGFILE has . observations and 56 variables.

  • SAS procedures that depend on the observation count (for example, the SORT procedure or the COMPARE procedure) might return unpredictable results.
  • The FIRSTOBS= and OBS= system and data set options, which specify observation numbers, produce errors and sluggish processing when one or both of the observation numbers exceed the maximum number of observations.
  • Operations that update the observation count cannot be submitted. You cannot reset the observation count by deleting observations.
  • Functionality that utilizes the internal record ID is not available (for example, to use an observation number to locate a record as in POINT or GET commands).

File Compression

When a request to compress a SAS data set for which the observation count is no longer maintained is submitted, the compression percentage cannot be calculated. When the compressed data set is created, SAS does not write a note to the SAS log that indicates the percentage of reduction.

Indexes and Integrity Constraints

If the SAS data set exceeds the maximum number of observations, you cannot create an index or an integrity constraint, even the integrity constraints that do not require an index.

WHERE processing continues to execute. However, optimizing the WHERE expression by using an index is not available.

CEDA Processing

For CEDA processing between operating environments with a 32-bit long integer and a 64-bit long integer, the maximum number of observations is that of the operating environment with the 32-bit long integer. When the maximum is exceeded, CEDA processing stops. In addition, if CEDA tries to open a file that already exceeds the maximum, the open fails.

The following situations produce an error for CEDA processing:

  • When an operating environment with a 32-bit long integer tries to open a SAS data set that was created by an operating environment with a 64-bit long integer, the open fails if the data set exceeds the 32-bit maximum.
  • For output processing, when an operating environment with a 32-bit long integer tries to create a SAS data set for an operating environment with a 64-bit long integer, processing stops if the data set exceeds the 32-bit maximum.
  • When an operating environment with a 64-bit long integer tries to open a SAS data set that was created by an operating environment with a 32-bit long integer, the open fails if the data set exceeds the 32-bit maximum.
  • For output processing, when an operating environment with a 64-bit long integer tries to create a SAS data set for an operating environment with a 32-bit long integer, processing stops if the data set exceeds the 32-bit maximum.

APPEND Procedure

For SAS 8, the APPEND procedure behavior depends on the append method. If the index is updated for each added observation, when the maximum is exceeded, PROC APPEND stops adding observations and issues an error. For the fast append method, which updates the index after all observations are added, when the maximum is exceeded, the observations that are already appended are canceled. The SAS data set is marked as damaged. In interactive mode, an automatic recovery removes all observations that were appended before the maximum was reached, restores the index and integrity constraints, and resets the damaged flag. In batch mode, no automatic recovery occurs. The file remains marked as damaged.

SORT Procedure

When a SAS data set exceeds the maximum number of observations, the SORT procedure completes the operation with the sorting keys properly ordered by the specified variables. However, the relative order of the observations in the output data set can be affected. The relative output order is determined by the EQUALS or NOEQUALS option in the PROC SORT statement. For observations with identical BY variable values, EQUALS maintains the relative order of the observations from the input data set to the output data set. NOEQUALS does not necessarily preserve this order in the output data set. EQUALS is the default.

The relative output order depends on a sequence number, which represents the observation number. The sequence number is stored as a signed long integer at the end of each key. If a SAS data set exceeds the maximum number of observations, the EQUALS option can be disabled depending on the SAS version.

  • In SAS 9, the sequence number is a 64-bit long integer regardless of the operating environment. Therefore, exceeding the maximum number of observations is not an issue and the EQUALS option is not disabled.
  • In SAS 8, the sequence number is a 32-bit long integer in operating environments that use a 32-bit model and a 64-bit long integer in operating environments that use a 64-bit model. The EQUALS option is disabled if there is a chance that the sequence number will exceed the maximum number of observations. If the EQUALS option is disabled, then a warning is issued when the sort completes.

SAS Views

If a SAS data file from which the PROC SQL view is derived has an index or integrity constraint, then the PROC SQL view cannot exceed the maximum number of observations that are allowed for the operating environment.

Recovering From an Exceeded Maximum

Delete the Index or Integrity Constraint

If you encounter an exceeded maximum number of observations and the data set has an index or an integrity constraint that uses an index, you can delete the index or the integrity constraint and continue processing. However, you still incur the limited functionality that occurs when the SAS data set exceeds the maximum number of observations.

You can use the DATASETS procedure or the SQL procedure to delete indexes and integrity constraints.

Recreate the SAS Data Set

If you want to retain your index or integrity constraint, you must recreate the SAS data set by dividing it into two or more SAS data sets.

Consider the SAS® Scalable Performance Data Engine

The SAS Scalable Performance Data (SPD) Engine, which is designed for high-performance data delivery, is an alternative for processing very large data sets. The SPD Engine reads and writes data sets that contain billions of observations. The SPD Engine can deliver data to applications rapidly because it organizes the data into a streamlined file format. For example, the engine reads partitioned data sets, which enables it to use multiple CPUs to perform parallel I/O functions. See SAS Scalable Performance Data Engine: Reference.

Migrate to a 64-Bit Operating Environment

If a 32-bit operating environment reaches the maximum, consider migrating to a 64-bit operating environment that stores integers using the 64-bit data model. For SAS 9.2, the environments include HP UX on Itanium 64-bit platform, HP UX on 64-bit platform, Linux on x64 64-bit platform, AIX UNIX on 64-bit RS/6000, Solaris on SPARC 64-bit platform, Solaris on x64 64-bit platform, and OpenVMS for HP Integrity servers 64-bit platform.

Operating System and Release Information

Product FamilyProductSystemSAS Release
ReportedFixed*
SAS SystemBase SASSolaris for x649.2 TS2M09.3 TS1M0
OpenVMS on HP Integrity9.2 TS2M09.3 TS1M0
Linux for x649.2 TS2M09.3 TS1M0
Linux9.2 TS2M09.3 TS1M0
HP-UX IPF9.2 TS2M09.3 TS1M0
64-bit Enabled Solaris9.2 TS2M09.3 TS1M0
64-bit Enabled HP-UX9.2 TS2M09.3 TS1M0
64-bit Enabled AIX9.2 TS2M09.3 TS1M0
Windows Vista for x649.2 TS2M09.3 TS1M0
Windows Vista9.2 TS2M09.3 TS1M0
Microsoft Windows XP Professional9.2 TS2M09.3 TS1M0
Microsoft Windows Server 2008 for x649.2 TS2M09.3 TS1M0
Microsoft Windows Server 2003 for x649.2 TS2M09.3 TS1M0
Microsoft Windows Server 2003 Standard Edition9.2 TS2M09.3 TS1M0
Microsoft Windows Server 2003 Enterprise Edition9.2 TS2M09.3 TS1M0
Microsoft Windows Server 2003 Datacenter Edition9.2 TS2M09.3 TS1M0
Microsoft® Windows® for x649.2 TS2M09.3 TS1M0
Microsoft Windows XP 64-bit Edition9.2 TS2M09.3 TS1M0
Microsoft Windows Server 2003 Enterprise 64-bit Edition9.2 TS2M09.3 TS1M0
Microsoft Windows Server 2003 Datacenter 64-bit Edition9.2 TS2M09.3 TS1M0
Microsoft® Windows® for 64-Bit Itanium-based Systems9.2 TS2M09.3 TS1M0
z/OS9.2 TS2M09.3 TS1M0
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.